Quick start 1: How do I build *OmniPath* data with *pypath*?
============================================================

``pypath`` provides an easy way to build the OmniPath network as described in our paper. The first time, this will take several minutes, because all data will be downloaded from the original providers. Next time, pypath will use the data from its cache directory, so the network will build much faster. If you want to load it even faster, you can save it into a pickle dump.

    import pypath
    pa = pypath.PyPath()
    pa.load_omnipath()

Quick start 2: I just want a network quickly and play around with *pypath*
==========================================================================

You can find the predefined formats in the ``pypath.data_formats`` module. For example, to load one resource from there, let's say Signor:

    import pypath
    pa = pypath.PyPath()
    pa.load_resources({'signor': pypath.data_formats.pathway['signor']})

Or to load all *process description* resources:

    import pypath
    pa = pypath.PyPath()
    pa.init_network(pypath.data_formats.pathway)

Quick start 3: How do I build networks from any data with *pypath*?
===================================================================

Here we show how to build a network from your own files. The advantage of building a network with pypath is that you don't need to worry about merging redundant elements, or about different formats and identifiers. Let's say you have two files with network data:

**network1.csv**

    entrezA,entrezB,effect
    1950,1956,inhibition
    5290,207,stimulation
    207,2932,inhibition
    1956,5290,stimulation

**network2.sif**

    EGF + EGFR
    EGFR + PIK3CA
    EGFR + SOS1
    PIK3CA + RAC1
    RAC1 + MAP3K1
    SOS1 + HRAS
    HRAS + MAP3K1
    PIK3CA + AKT1
    AKT1 - GSK3B

*Note: you need to create these files in order to load them.*

1: Defining input formats
-------------------------

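Before the input formats can be used, the two example files have to exist on disk. One way to create them is sketched below. It uses only the Python standard library; the helper name ``write_example_files`` is hypothetical and not part of pypath, and the file contents are copied from the listings above.

```python
import os

# Contents of the two example files, copied from the listings above.
NETWORK1_CSV = '\n'.join([
    'entrezA,entrezB,effect',
    '1950,1956,inhibition',
    '5290,207,stimulation',
    '207,2932,inhibition',
    '1956,5290,stimulation',
]) + '\n'

NETWORK2_SIF = '\n'.join([
    'EGF + EGFR',
    'EGFR + PIK3CA',
    'EGFR + SOS1',
    'PIK3CA + RAC1',
    'RAC1 + MAP3K1',
    'SOS1 + HRAS',
    'HRAS + MAP3K1',
    'PIK3CA + AKT1',
    'AKT1 - GSK3B',
]) + '\n'


def write_example_files(directory = '.'):
    """Write network1.csv and network2.sif into `directory`.

    Hypothetical helper for this tutorial; returns the two file paths.
    """
    paths = []
    for fname, content in (
        ('network1.csv', NETWORK1_CSV),
        ('network2.sif', NETWORK2_SIF),
    ):
        path = os.path.join(directory, fname)
        with open(path, 'w') as fp:
            fp.write(content)
        paths.append(path)
    return paths
```

Calling ``write_example_files()`` without arguments writes both files into the current working directory, which is where the ``inFile`` settings in this section look for them.
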
    import pypath
    import pypath.input_formats as input_formats

    input1 = input_formats.ReadSettings(
        name = 'egf1',
        inFile = 'network1.csv',
        header = True,
        separator = ',',
        nameColA = 0,
        nameColB = 1,
        nameTypeA = 'entrez',
        nameTypeB = 'entrez',
        sign = (2, 'stimulation', 'inhibition')
    )

    input2 = input_formats.ReadSettings(
        name = 'egf2',
        inFile = 'network2.sif',
        separator = ' ',
        nameColA = 0,
        nameColB = 2,
        nameTypeA = 'genesymbol',
        nameTypeB = 'genesymbol',
        sign = (1, '+', '-')
    )

2: Creating PyPath object and loading the 2 test files
------------------------------------------------------

    inputs = {
        'egf1': input1,
        'egf2': input2
    }

    pa = pypath.PyPath()
    pa.reload()
    pa.init_network(lst = inputs)

3: Plotting the network with igraph
-----------------------------------

    import igraph

    plot = igraph.plot(
        pa.graph,
        target = 'egf_network.png',
        edge_width = 0.3,
        edge_color = '#777777',
        vertex_color = '#97BE73',
        vertex_frame_width = 0,
        vertex_size = 70.0,
        vertex_label_size = 15,
        vertex_label_color = '#FFFFFF',
        # due to a bug in either igraph or IPython,
        # vertex labels are not visible on inline plots:
        inline = False
    )

    from IPython.display import Image
    Image(filename='egf_network.png')

4: Querying the PyPath object
-----------------------------

This object offers many methods for analysing the network and for integrating additional data. Let's see some examples. List the proteins that stimulate PIK3CA:

    list(pa.gs_stimulated_by('PIK3CA').gs())

And those stimulated by PIK3CA:

    list(pa.gs_stimulates('PIK3CA').gs())

In the background, in order to run these queries, pypath converted the network to a directed graph:

    import sys

    # the original, undirected igraph object:
    sys.stdout.write('pa.graph is directed: %s\n' % pa.graph.is_directed())
    # the directed one:
    sys.stdout.write('pa.dgraph is directed: %s\n' % pa.dgraph.is_directed())

    from pypath import data_formats

    pa.load_resources(data_formats.pathway)

6: Further attributes in the *PyPath* object
--------------------------------------------

We just loaded pathway data from the pathway resources in OmniPath. The two small example networks are still among the sources; the new resources have been added alongside them.

    pa.graph.es[pa.get_edge('EGF', 'EGFR')]['sources']

We can also see the directions and effects of this interaction:

    print(pa.graph.es[pa.get_edge('EGF', 'EGFR')]['dirs'])

As we see, the first example network wrongly had EGF inhibiting EGFR.

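The same disagreement can be spotted without pypath. The sketch below is not how pypath merges data: it hard-codes the rows of the two example files and uses a hand-written Entrez Gene ID to GeneSymbol mapping (an assumption made for illustration, since pypath performs this translation itself), then lists the directed pairs for which the two files report different effects.

```python
# Toy comparison of the two example networks; not pypath's merging logic.

# Hand-written Entrez Gene ID -> GeneSymbol mapping (illustrative assumption).
ENTREZ_TO_SYMBOL = {
    '1950': 'EGF',
    '1956': 'EGFR',
    '5290': 'PIK3CA',
    '207': 'AKT1',
    '2932': 'GSK3B',
}

# (source, target, effect) rows of network1.csv
NETWORK1 = [
    ('1950', '1956', 'inhibition'),
    ('5290', '207', 'stimulation'),
    ('207', '2932', 'inhibition'),
    ('1956', '5290', 'stimulation'),
]

# (source, sign, target) rows of network2.sif
NETWORK2 = [
    ('EGF', '+', 'EGFR'),
    ('EGFR', '+', 'PIK3CA'),
    ('EGFR', '+', 'SOS1'),
    ('PIK3CA', '+', 'RAC1'),
    ('RAC1', '+', 'MAP3K1'),
    ('SOS1', '+', 'HRAS'),
    ('HRAS', '+', 'MAP3K1'),
    ('PIK3CA', '+', 'AKT1'),
    ('AKT1', '-', 'GSK3B'),
]

SIGN_TO_EFFECT = {'+': 'stimulation', '-': 'inhibition'}


def effects_by_pair():
    """Collect the set of effects reported for each directed pair."""
    effects = {}
    for a, b, effect in NETWORK1:
        pair = (ENTREZ_TO_SYMBOL[a], ENTREZ_TO_SYMBOL[b])
        effects.setdefault(pair, set()).add(effect)
    for a, sign, b in NETWORK2:
        effects.setdefault((a, b), set()).add(SIGN_TO_EFFECT[sign])
    return effects


def conflicting_pairs():
    """Return the directed pairs reported with more than one effect."""
    return sorted(
        pair
        for pair, effects in effects_by_pair().items()
        if len(effects) > 1
    )


print(conflicting_pairs())
```

With the example data, the only pair reported with two different effects is EGF and EGFR; the other three interactions shared by the files agree.
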
But what is ``gs()``? It is a shorthand for ``genesymbol()``, which makes it easy to query the object by human-readable names. Let's see what other stimulators PIK3CA has now, based on all the resources above.

    pa.dgraph = None # to remove the old directed graph and get a new one
    print(list(pa.gs_stimulated_by('PIK3CA').gs()))

And its inhibitors:

    list(pa.gs_inhibited_by('PIK3CA').gs())

The ``gs_`` prefix means we query by GeneSymbol; the query returns a special ``VertexSeq`` object:

    pa.gs_inhibited_by('PIK3CA')

It's possible to get generators to iterate over this vertex sequence, for example by GeneSymbols:

    pa.gs_inhibited_by('PIK3CA').gs()

Or by UniProt IDs:

    list(pa.gs_inhibited_by('PIK3CA').up())

Or by ``igraph.Vertex`` objects:

    list(pa.gs_inhibited_by('PIK3CA').vs())

The ``affects`` and ``affected_by`` methods query the molecules affected by, or having an effect on, the target, respectively. The effect may be stimulation, inhibition or unknown:

    print(list(pa.gs_affects('EGFR').gs()))

And the proteins affecting EGFR; we see EGF among them:

    print(list(pa.gs_affected_by('EGFR').gs()))

The direct neighbors, regardless of direction, are retrieved by the ``neighbors`` methods:

    print(list(pa.gs_neighbors('EGFR').gs()))

The ``neighborhood`` methods return the indirect neighborhood within a custom number of steps (note that the size of the neighborhood increases rapidly with the number of steps):

    print(list(pa.gs_neighborhood('EGFR').gs())) # 1 step by default
    print('')
    print(list(pa.gs_neighborhood('EGFR', 2).gs())) # 2 steps

7: Accessing the literature references
--------------------------------------

References are listed in the ``references`` edge attribute:

    edge = pa.graph.es[pa.get_edge('EGF', 'EGFR')]
    refs = edge['references']
    print(refs)

Each reference has its PubMed ID:

    refs[0].pmid

To open it in a browser, just call ``open()``:

    refs[0].open()

It's possible to know which resources cite which papers:

    edge['refs_by_source']

8: *Direction* objects
----------------------

Each edge stores its direction and effect sign data in one ``Direction`` object:

    edge['dirs']

There are also ways to query the direction and effect information by resources:

    dirs = edge['dirs']
    dirs.get_dir(dirs.reverse, sources = True)

``reverse`` means an arbitrary direction, noted as a tuple of UniProt IDs, where UniProt ID #1 affects UniProt ID #2; ``straight`` is its opposite, and ``undirected`` means unknown direction:

    print(dirs.reverse)
    print(dirs.straight)

By default, ``get_dir()`` returns boolean values:

    dirs.get_dir(dirs.reverse)

Effect signs can be queried the same way. The result is a pair of boolean values: the first tells whether the interaction is a stimulation, the second whether it is an inhibition.

    dirs.get_sign(dirs.reverse)
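
A pair of booleans is compact but not very readable. The helper below is a minimal sketch of how such a pair could be turned into a label, assuming the order described above (stimulation first, inhibition second). The name ``describe_sign`` is hypothetical and not part of pypath.

```python
def describe_sign(sign_pair):
    """Translate a (stimulation, inhibition) pair of booleans into a label.

    Hypothetical helper; assumes stimulation comes first in the pair.
    """
    stimulation, inhibition = sign_pair
    if stimulation and inhibition:
        return 'stimulation and inhibition'
    if stimulation:
        return 'stimulation'
    if inhibition:
        return 'inhibition'
    return 'unknown effect'


# All four possible combinations:
for pair in ([True, False], [False, True], [True, True], [False, False]):
    print('%s -> %s' % (pair, describe_sign(pair)))
```

With a loaded network it could be applied as ``describe_sign(dirs.get_sign(dirs.reverse))``. A pair with both values true means that the resources disagree about the sign, or that both effects have been reported.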